Background

On September 26, 2016 at 11:47 a.m. U.S. Central Daylight Time (16:47 UTC) the Cedar and Wapsipinicon rivers in Iowa surged producing a flood wave that breached the river banks. The water level of the Cedar River measured ~20 feet — 8 feet above flood stage—near the city of Cedar Rapids.

The water level continued to rise until it peaked at ~22 feet on September 27. This event had only been exceeded once, in June 2008, when thousands of people were encouraged to evacuate from Cedar Rapids, the second-most-populous city in Iowa.

In this lab we are interested in the impacts in Palo Iowa because it is up stream of Cedar Rapids, contains a large amount of farm land, and does not have a forecast location to provide warning.

We will use the raster package and our understanding of raster data and categorization to create flood images using mutliband Landsat Imagery, thresholding, and classification methods.

Libraries

library(raster) # Raster Data handling
library(tidyverse) # Data Manipulation
library(getlandsat) # keyless Landsat data (2013-2017)
library(osmdata)
library(sf) # Vector data processing
library(mapview) # Rapid Interactive visualization

Remote Sensing/Image Analysis begins with usually the same steps:

    1. Identifying an area of interest (AOI)
    1. Identifying and downloading the relevant images or products
    1. Analyzing the raster products

Question 1 (AOI Definition)

Using uscities.csv, filter to Palo, Iowa and create a 5 kilometer buffer.

From the Palo, Iowa feature:

This region defines the AOI for this analysis.

bb = read.csv('../data/uscities.csv') %>%
  filter(city == "Palo") %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326) %>%
  st_transform(5070) %>%
  st_buffer(5000) %>%
  st_bbox() %>%
  st_as_sfc() %>%
  st_as_sf()

Question 2 (Data Acquisition)

2.1 (Retrieve Palo flood data)

For our analysis we will be using data from Landsat 8. Landsat 8 is the newest generation of the Landsat satellites and provides a useful resource for water detection. The OLI sensor aboard Landsat 8 has nine bands for capturing the spectral response of the earth’s surface at discrete wavelengths along the electromagnetic spectrum.

Amazon hosts all Landsat 8 data in a bucket associated with the OpenData Registry Initiative

The getlandsat R package provides a nice interface to this product for images taken between 2013-2017. Current efforts to extend this resource are underway using the SAT-API.

To find our images, we first need to list all available scenes, filter to those that meet our criteria (date and bounding box), and isolate the scenes unique identifier.

To do this:

  • load all available scenes with getlandsat::lsat_scenes() and assign it to an object
    • getlandsat::lsat_scenes() will download and extract a CSV of ALL scenes archived on the AWS bucket between (2013-2017) (n = 2,070,448)
    • While reading in this much data takes some time (~30 seconds), it only needs to be done when you are searching for scenes. That is why it should NOT be included in your Rmd.
  • Once its loaded, filter the scenes to those that are suitable for our AOI on 2016-09-26
    • transform your bbox to EPSG:4326, and create a new bounding box object (st_bbox)
    • Use >= and <= to filter based on your xmin, xmax, ymin, ymax values
    • The acquisitionDate comes as a POSIXct, these include hour:minutes:seconds. Cast it to a date object with as.Date, and check it against (as.Date(“2016-09-26”))
  • Once you have identified the meta data for your scene, save it to a csv file in your data folder using write.csv:
#code is from lab05.R file
bbwgs = bb %>% st_transform(4326)
bb = st_bbox(bbwgs)

osm = osmdata::opq(bbwgs) %>%
  #or building
  add_osm_feature("natural") %>%
  osmdata_sf()

#mapview(osm$osm_polygons)

scenes = lsat_scenes()

down = scenes %>%
  filter(min_lat <= bb$ymin, max_lat >= bb$ymax,
         min_lon <= bb$xmin, max_lon >= bb$xmax,
         as.Date(acquisitionDate) == as.Date("2016-09-26"))

#write.csv(down, file = "data/palo-flood-scene.csv")

2.2 (Caching the Data)

We need to download and cache the data on our computers.

Caching essentially means we will download the data to a standardized location known to the downloading utility.

Before any data is downloaded, the utility will check if it already exisits. If it does, then the path to the file is returned, if it does not, the data is downloaded.

The getlandsat package provides a nice caching system. To download and cache image data we need to do the following:

  • Read in the csv of your image meta data from the file saved in your data directory (read_csv)
  • Pass the download URL variable to lsat_scene_files
    • This will list all available files for that scene
    • We will then filter these to only include the TIF files for bands 1-6.

We can do this using grepl like last week! However we will expand on pattern matching techniques using multiple patterns and a constraint:

  • This time we will search for a set of patterns (B1.TIF, B2.TIF, B3.TIF,… B6.TIF) rather then a single one (like “C” or “R”)
    • This effectively checks for THIS pattern or THAT pattern (THIS|THAT)
    • We can create this pattern with paste and the collapse argument
  • We can search for multiple patterns by separating them with the “OR” operator |
    • Specifically, we want to collapse our patterns on the | separator:
meta = read_csv("../data/palo-flood-scene.csv")

files = lsat_scene_files(meta$download_url) %>%
  filter(grepl(paste0("B", 1:6, ".TIF$", collapse = "|"), file)) %>%
  arrange(file) %>%
  pull(file)

2.3 (Download sat files to cache)

Now that we have the URLs of the files we want, we need to download them to our cache location, using getlandsat

lsat_image will take a file name - like those your created in step 2 - and download the file to your cache. If the download has already occurred, then the path to the cached image is returned without re downloading the data.

Right now we have 6 files we want, 1 for each band. So we want to apply lsat_image, over this vector of files to return a set of local file paths. For this we can use the apply family in base R.

  • lapply returns a list of elements resulting from a specified function (FUN) applied to all inputs.

  • sapply is a wrapper of lapply that returns a vector, matrix or, if simplify = “array”, an array rather then a list

  • vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.

We want to apply the lsat_image function over our URL paths to return a vector of local file paths … so we can use sapply where the files are the input and the FUN = lsat_image

st = sapply(files, lsat_image)

s = stack(st) %>% setNames(c(paste0("band", 1:6)))
s
## class      : RasterStack 
## dimensions : 7811, 7681, 59996291, 6  (nrow, ncol, ncell, nlayers)
## resolution : 30, 30  (x, y)
## extent     : 518085, 748515, 4506585, 4740915  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=15 +datum=WGS84 +units=m +no_defs 
## names      : band1, band2, band3, band4, band5, band6 
## min values :     0,     0,     0,     0,     0,     0 
## max values : 65535, 65535, 65535, 65535, 65535, 65535

2.4 (Crop Rasterstack st )

We only want to analyze our image for the regions surrounding Palo (our AOI). We will transform our AOI to the CRS of the landsat stack and use it to crop the raster stack.

cropper = bbwgs %>%
  st_transform(crs(s))

r = crop(s, cropper)
r
## class      : RasterBrick 
## dimensions : 340, 346, 117640, 6  (nrow, ncol, ncell, nlayers)
## resolution : 30, 30  (x, y)
## extent     : 594075, 604455, 4652535, 4662735  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=15 +datum=WGS84 +units=m +no_defs 
## source     : memory
## names      : band1, band2, band3, band4, band5, band6 
## min values :  8566,  7721,  6670,  6057,  5141,  4966 
## max values : 18482, 19079, 19592, 21965, 24772, 23896

Question 3: (Image Creation)

We have loaded them as a multiband raster object in R and cropped the domain to our AOI. Lets make a few RGB plots to see what these images reveal.

115 A-C covers the notion of spectral signatures (bands) and spectral combinations in greater detail as well as common task like atmospheric correction. We dont need to worry much about correction here since we are using a LT1 product (check the processingLevel in the metadata)

3.1 (Rename raster stack with RS Band names)

Standard cameras replicate whats seen with the human eye, by capturing light in the red, green and blue wavelengths and applying red, green ,and blue filters (channels) to generate a natural looking RGB image.

With a multispectral Landsat 8 image, we have more information to work with and different wavelengths/combinations can help isolate particular features.

For example, the Near Infrared (NIR) wavelength is commonly used to analysis vegetation health because vegetation reflects strongly in this portion of the electromagnetic spectrum. Alternatively, the Shortwave Infrared (SWIR) bands are useful for discerning what is wet and dry.

When working with Landsat imagery, a logical first step is to load an image into an image analysis program (like ENVI) to visualize whats in the scene. We can do the same thing with R using the plotRGB function and selecting which band should populate each channel.

Rename raster stack with the names of the band (e.g. “coastal”, “red”, “green”, …)

par(mfrow = c(2,2))
#plot Natural
plotRGB(r, r = 4, g = 3, b = 2, axes = F, main = "Natural")
#plot color IR
plotRGB(r,axes = F, 5, 4, 3, main = "IR")
#plot false for water
plotRGB(r, 5, 7, 1, axes = F, main = "False Water Focus")
#False for Ag.
plotRGB(r, 6, 5, 2, axes = F, main = "False Ag. Focus")

3.2 (Replicate Images)

par(mfrow = c(2,2))
#stretch lin = 
plotRGB(r, r = 4, g = 3, b = 2, stretch = "lin", main = "lin")
#stretch hist = 
plotRGB(r, r = 4, g = 3, b = 2, stretch = "hist", main = "hist")

The purpose of applying a color sketch is to refine color contrast, in this example, I prefer the “lin” stretch because it groups the landscapes into relative types. The rural lands are more similar in color than in the “hist” stretch.


Question 4: (Thresholding)

Accurate assessment of surface water features (like flooding) have been made possible by remote sensing technology. Index methods are commonly used for surface water estimation using a threshold value.

For this lab we will look at 5 unique thresholding methods for delineating surface water features from different combinations of Landsat bands.

4.1 (Raster Algebra)

Raster Algebra * Create 5 new rasters using the formulas for NDVI, NDWI, MNDWI, WRI and SWI * Combine those new rasters into a stacked object set the names of your new stack to useful values * Plot the new stack, using the following palette (colorRampPalette(c(“blue”, “white”, “red”))(256)) * Describe the 5 images. How are they simular and where do they deviate?

#NDVI
ndvi = (r$band5 - r$band4) / (r$band5 + r$band4)
#NDWI
ndwi = (r$band3 - r$band5) / (r$band3 + r$band5) 
#MNDWI
mndwi = (r$band3 - r$band6) / (r$band3 + r$band6)
#WRI
wri = (r$band3 + r$band4) / (r$band5 + r$band6)
#SWI
swi = 1 / sqrt(r$band2 - r$band6)

#stack
clr_st = stack(ndvi, ndwi, mndwi, wri, swi) %>%
  setNames(c("NDVI", "NDWI", "MNDWI", "WRI", "SWI"))
plot(clr_st)

dim(clr_st)
## [1] 340 346   5
#plot
plot(clr_st, col = colorRampPalette(c("blue", "white", "red"))(256))

  • NDVI: represents ‘normalized difference vegetation index’, highlights vegetation among water and other landscape
  • NDWI: represents ‘normalized difference water index’, separates water and dry land
  • MNDWI: represents ‘modified normalized difference water index’, highlights active or passive water sources, including potential drainage/flood points
  • WRI: represents ‘water ratio index’, which clearly differentiates land and water
  • SWI: represents ‘simple water index’, which highlights only naural water sources

4.2 (Raster Thresholding)

Here we will extract the flood extents from each of the above rasters.

For this, we will use the calc function and apply a custom formula for each calculated field from step 1 that applies the threshold in a way that flooded cells are 1 and non-flooded cells are 0.

  • stack the binary ([0,1]) files into a new stack
  • set the names to meaningful descriptions
  • plot the stack so that floods are blue, and background is white.
thresholding = function(x){ifelse(x <= 0, 1, NA)}

flood = calc(ndvi, thresholding)
plot(flood, col = "blue")


Question 5: (Classification)

An alternative way to identify similar features in a continuous field is through supervised or unsupervised classification. Supervised classification groups values (cells) based on user supplied “truth” locations. Since flood events are fast-occurring there is rarely truth points for a live event. Instead developers rely on libraries of flood spectral signatures.

Unsupervised classification finds statistically significant groupings within the data. In these clustering algorithms, the user specifies the number of classes and the categorization is created based on the patterns in the data.

For this lab we will use a simple k-means algorithm to group raster cells with similar spectral properties.

5.1 (Set a seed)

Anytime we want to be able to produce a consistent/reproducible result from a random process in R we need to set a seed. Do so using set.seed

set.seed(50)

5.2 (kmeans)

  • Extract the values from your 6-band raster stack with getValues
  • Check the dimensions of the extracted values with dim
  • What do the diminsions of the extracted values tell you about how the data was extracted?
band_st = getValues(clr_st)
dim(band_st)
## [1] 117640      5
#Dimensions have increased in value, extraction may have expanded on pixel value
  • Remove NA values from your extracted data with na.omit for safety

  • Use the kmeans clustering algorithm from the stats package to cluster the extracted raster data to a specified number of clusters k (centers). Start with 12.

  • Once the kmeans algorithm runs, the output will be a list of components. One of these is cluster which provides a vector of integers from (1:k) indicating the cluster to which each cell was allocated.

  • Create a new raster object by copying one of the original bands.

#E = kmeans(band_st, 5, iter.max = 100) %>%
 # na.omit()

#kmeans_raster = flood$ndvi
  • Set the values of the copied raster to the cluster vector from the output kmeans object. For example:
#values(kmeans_raster) = kmeans_E$cluster

Try a few different clusters (k) to see how the map changes

5.3 (Table using kmeans_raster)

To identify the flood category programatically, generate a table crossing the values of one of your binary flood rasters, with the values of you kmeans_raster. To do this, you will use the table function and pass it the values from a binary flood raster, and the values from your kmeans_raster. Here the following occurs:

  • table builds a contingency table counting the number of times each combination of factor levels in the input vector(s) occurs. This will give us a table quantifying how many cells with a value 1 are aligned with each of the k classes, and how many cells with a value 0 are aligned with each of the k classes.

  • If you pass the binary flood values as the first argument to table then the unique values (0,1) will be the rows. They will always be sorted meaning you know the flooded cells will be in the second row.

  • which.max() returns the index of the maximum value in a vector.

  • combine this information to identify the cluster in the kmeans data that coincides with the most flooded cells in the binary mask.

  • Once you know this value, use calc to extract the flood mask in a similar way to the thresholding you did above.

  • Finally use addLayer to add this new layer to your flood raster stack.


Questions 6: (Summary)

Our last goal is to identify how they compare. ### 6.1 (Total Area) * we will calculate the total area of the flooded cells in each image. You can use cellStats to determine the sum of each layer. Since flooded cells have a value of 1, the sum of an entire band is equivalent to the number of flooded cells. You can then use the resolution of the cell to convert counts to a flooded area. * print values as a kable table

6.2 (Summing the stack)

  • visualize the uncertainty in our classifications by summing the entire stack using calc. The higher the count in each pixel, the more certain we can be about its flooded state. For example, if a cell has a value of 6, it indicates that every method identified the cell as flooded, if it has a value of 2 then we know that two of the methods identified the cell as flooded.
  • plot your flood map using the blues9 color palette

6.3 (Plot stack)

  • once you have a summed raster layer, copy it as a new layer, and set all 0 values to NA. Then map the raster with mapview. Zoom and pan around the interactive map noting that a pixel level is displayed in the upper right hand corner.
  • why are some of the cell values not an even number?

This kind of work goes on regularly and is part of a couple national efforts (NOAA, USGS, FirstStreet, FEMA) to generate flood inundation libraries that contribute to better extraction and classification of realtime flood events, resource allocation during events, and damage assessments post events.

Here we used landsat imagery but the same process could be implemented on drone footage, MODIS data, or other private satellite imagery.

Your evaluation was based purely on the raster data structure and your ability to conceptualize rasters as vectors of data with dimensional structure. You applied simple mathematical operators (+, /, -) to the raster bands, and a kmeans clustering algorithm to the data matrix of the multiband raster


Extra Credit: Pixel Evaluation

Use mapview to generate a slippy map of the Palo, Iowa bbox. Find the location shown in the above image using context clues and different base maps. Once you do, do the following: